ISSS608 Assignment 1 Part 1

This post is a submission for ISSS608 Assignment 1, VAST Challenge 2.

Yeo Chia Guan Andy
07-25-2021

VAST Challenge 2

Content

1.0 Literature Review

For the literature review, we examined a sample entry on GitHub from a team that participated in the VAST Challenge 2014, Mini Challenge 2. The two-person team from City University London performed their analysis in Processing, using the giCentre Utils and geoMap libraries, both written by the giCentre at City University London.

The team used an animated map to show the coordinates of each vehicle at any point in time. By highlighting locations outside the norm, they found five unknown locations visited by four security employees.

A possible gap for this study to address is identifying the credit card owners by cross-referencing the vehicle dataset with the credit card dataset.

2.0 Insights

2.1 Credit Card and Loyalty Card Transactions

Using the loyalty card and credit card data, we first explore the popularity of shops by counting the number of visits per location. Katerina’s Cafe and Hippokampos were the two most popular shops by number of transactions over the period from 6 Jan to 19 Jan 2014.
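The visit count per location can be sketched as follows. This is a minimal Python illustration, not the code used to produce the charts in this post; the sample rows and the function name `visits_per_location` are hypothetical.

```python
from collections import Counter

# Hypothetical sample of card transactions as (date, location) pairs.
# The real credit card and loyalty card files hold many more rows.
sample_transactions = [
    ("2014-01-06", "Katerina's Cafe"),
    ("2014-01-06", "Hippokampos"),
    ("2014-01-06", "Katerina's Cafe"),
    ("2014-01-07", "Brew've Been Served"),
    ("2014-01-07", "Katerina's Cafe"),
    ("2014-01-07", "Hippokampos"),
]


def visits_per_location(rows):
    """Return (location, visit count) pairs, most visited first."""
    counts = Counter(location for _date, location in rows)
    return counts.most_common()


for location, visits in visits_per_location(sample_transactions):
    print(f"{location}: {visits}")
```

Sorting by count puts the most popular shops at the top, which is the ordering used when ranking locations.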

2.1.1 Loyalty Card Usage

2.1.2 Credit Card Usage

2.2 Visitor per location by day

Next, we plot the visitor count by day for each location to identify possible trends. We observed that at least 10 vehicles visit Brew’ve Been Served every weekday.
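A daily count per location, and a check of the smallest weekday count, might look like the sketch below. The sample visits and the helper names `daily_counts` and `min_weekday_visits` are assumptions made for illustration only.

```python
from collections import Counter
from datetime import date

# Hypothetical visits as (ISO date, location) pairs.
sample_visits = [
    ("2014-01-06", "Brew've Been Served"),  # Monday
    ("2014-01-06", "Brew've Been Served"),
    ("2014-01-07", "Brew've Been Served"),  # Tuesday
    ("2014-01-11", "Hippokampos"),          # Saturday
]


def daily_counts(rows):
    """Count visits for every (location, date) pair."""
    return Counter((location, day) for day, location in rows)


def min_weekday_visits(rows, location, weekdays):
    """Smallest daily count at `location` over the given weekday dates.

    Days with no recorded visit count as zero.
    """
    counts = daily_counts(rows)
    return min(counts[(location, day)] for day in weekdays)


# Keep only Monday to Friday dates (weekday() returns 0 to 4 for those).
weekdays = [d for d in ("2014-01-06", "2014-01-07", "2014-01-11")
            if date.fromisoformat(d).weekday() < 5]
print(min_weekday_visits(sample_visits, "Brew've Been Served", weekdays))
```

A statement such as "at least 10 every weekday" corresponds to this minimum being 10 or more over all weekdays in the period.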

In the credit card data, there was 1 visit to Daily Dealz and 1 visit to U-Pump that fell outside the normal pattern. The visit to Daily Dealz was not captured in the loyalty data, which suggests that the loyalty card might not be accepted at that store.

The loyalty card data records the date of use but not the time. We therefore recommend that the time be collected as well, so that credit card and loyalty card transactions can be matched correctly to identify the card user.

2.2.1 Loyalty Card

2.2.2 Credit Card

Next, we merge the credit card data with the loyalty card data to identify their relationship. As the exact transaction time was not recorded for the loyalty card, we relate the two datasets using the date, location and price variables. A left join was applied because payment by credit card is assumed to be compulsory for each transaction, while the use of a loyalty card is optional. There may therefore be credit card transactions without a matching loyalty card record, which was apparent in the chart above: more visits were recorded in the credit card data than in the loyalty card data. Each credit card number should also pair consistently with a single loyalty card number, as both should belong to the same owner.
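The left join on date, location and price can be sketched in plain Python as below. This is an illustration under assumptions rather than the code behind this post: the records are hypothetical, and the field names (`last4ccnum`, `loyaltynum`) and the helper `left_join` are chosen for the example.

```python
from collections import defaultdict

# Hypothetical records; field names and values are assumptions for illustration.
credit_card = [
    {"date": "2014-01-06", "location": "Hippokampos", "price": 12.5, "last4ccnum": "0001"},
    {"date": "2014-01-06", "location": "Katerina's Cafe", "price": 8.0, "last4ccnum": "0002"},
]
loyalty = [
    {"date": "2014-01-06", "location": "Hippokampos", "price": 12.5, "loyaltynum": "L0001"},
]

KEYS = ("date", "location", "price")


def left_join(left, right, keys=KEYS):
    """Keep every left row; attach matching right rows, or None if no match."""
    # Index the right-hand rows by their join key for fast lookup.
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)

    joined = []
    for row in left:
        matches = index.get(tuple(row[k] for k in keys))
        if not matches:
            # No loyalty record: keep the credit card row with an empty match.
            joined.append({**row, "loyaltynum": None})
        for match in matches or []:
            joined.append({**row, "loyaltynum": match["loyaltynum"]})
    return joined


for row in left_join(credit_card, loyalty):
    print(row["last4ccnum"], row["location"], row["loyaltynum"])
```

Note that a credit card row matching several loyalty rows appears once per match, so the joined table can contain more rows than the credit card data.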

The data was grouped by location. Katerina’s Cafe was found to be the most popular location with 214 visitors, followed by Hippokampos with 173 visitors.

This was an anomaly, as the credit card data alone showed only 212 visitors for Katerina’s Cafe and 171 for Hippokampos.

2.3 Total Visits per Store

To address the anomaly, we compare the Patronage dataset with the credit card data. We filter for transactions made using both a credit card and a loyalty card to identify the cause of the anomaly.

We found that several credit cards were used with more than 1 loyalty card. Credit card numbers 1286, 4795, 4948, 5368, 5921, 7889 and 8332 each made transactions with more than 1 loyalty card.

This could imply that different GasTech employees met up at those locations, where a colleague might have used their own loyalty card to obtain the benefits from the store.

# A tibble: 7 x 2
  `unique(last4ccnum)` UniqueLoyalty
  <chr>                        <int>
1 1286                             2
2 4795                             2
3 4948                             2
4 5368                             2
5 5921                             2
6 7889                             2
7 8332                             2

Using the interactive datatable, we could identify the transactions made with the above credit card numbers. Each of these credit cards was used with a different loyalty card on more than one occasion.
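Counting the distinct loyalty cards seen with each credit card can be sketched as follows. The card numbers below are placeholders rather than values from the data, and `cards_with_multiple_loyalty` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical joined rows of (last4ccnum, loyaltynum); all values are placeholders.
sample_pairs = [
    ("0001", "L0001"),
    ("0001", "L0001"),
    ("0002", "L0002"),
    ("0002", "L0003"),
    ("0003", None),  # credit card transaction without a loyalty card
]


def cards_with_multiple_loyalty(pairs):
    """Return {credit card: distinct loyalty card count} for counts above 1."""
    seen = defaultdict(set)
    for ccnum, loyaltynum in pairs:
        if loyaltynum is not None:
            seen[ccnum].add(loyaltynum)
    return {cc: len(cards) for cc, cards in seen.items() if len(cards) > 1}


print(cards_with_multiple_loyalty(sample_pairs))
```

Using a set per credit card means repeated use of the same loyalty card is counted once, so only genuinely different cards raise the count.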

3.0 Georeferencing

For the Mini Challenge, we were given a JPG file of the map of Abila, Kronos, which we use as the background for the locations. To use the geospatial data, we need to georeference the map image using GIS software such as QGIS. For this assignment, QGIS was used to map points in the shapefile to the image file.

First, we were given the Abila.shp file, which contains the map information and can be opened in QGIS as shown below.

Next, to make the map useful, we align the given image with the map data using the QGIS Georeferencer, matching points on the image to their coordinates on the map.

Finally, we apply the georeference to get the final output as shown below.

Reading layer `Abila' from data source 
  `C:\yeochiaguan\DataViz\_posts\2021-07-27-isss608-assignment-1-part-1\data\Geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 3290 features and 9 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 24.82401 ymin: 36.04502 xmax: 24.90997 ymax: 36.09492
Geodetic CRS:  WGS 84
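Conceptually, georeferencing establishes a mapping between image pixels and map coordinates. The sketch below is a simplification: it assumes a north-up image whose edges coincide exactly with the bounding box reported above, and the image size is hypothetical. QGIS instead fits the transformation from the control points chosen in the Georeferencer.

```python
# Bounding box reported when reading the Abila shapefile (WGS 84, degrees).
XMIN, YMIN, XMAX, YMAX = 24.82401, 36.04502, 24.90997, 36.09492

# Hypothetical image size in pixels; the real map image may differ.
WIDTH, HEIGHT = 1000, 600


def pixel_to_lonlat(px, py, width=WIDTH, height=HEIGHT):
    """Convert image pixel coordinates to (lon, lat).

    Assumes a north-up image whose edges coincide with the bounding box.
    Pixel rows grow downwards, so latitude decreases as py increases.
    """
    lon = XMIN + (px / width) * (XMAX - XMIN)
    lat = YMAX - (py / height) * (YMAX - YMIN)
    return lon, lat


def lonlat_to_pixel(lon, lat, width=WIDTH, height=HEIGHT):
    """Inverse mapping from (lon, lat) to image pixel coordinates."""
    px = (lon - XMIN) / (XMAX - XMIN) * width
    py = (YMAX - lat) / (YMAX - YMIN) * height
    return px, py


print(pixel_to_lonlat(0, 0))           # top-left corner of the image
print(pixel_to_lonlat(WIDTH, HEIGHT))  # bottom-right corner of the image
```

With such a mapping in place, GPS points expressed in longitude and latitude can be drawn at the right position on the background image.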

4.0 Inferences

4.1 Merging Car and GPS Dataset

Next, we merge the GPS and car datasets, which together contain the vehicle data, to identify the owners and the trips made by each vehicle between 6 Jan 2014 and 19 Jan 2014.
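One way to picture this merge is to attach the owner details to each vehicle id and group the GPS points into one track per vehicle per day. The sketch below uses hypothetical rows; the field names and the helper `daily_tracks` are assumptions, and the GPS rows are assumed to be already sorted by timestamp.

```python
from collections import defaultdict

# Hypothetical rows; field names and values are assumptions for illustration.
car_assignments = {
    1: {"name": "Employee A", "title": "Engineer"},
    2: {"name": "Employee B", "title": "Geologist"},
}
gps = [
    {"id": 1, "timestamp": "2014-01-06 07:20:01", "lat": 36.07, "long": 24.87},
    {"id": 1, "timestamp": "2014-01-06 07:20:04", "lat": 36.07, "long": 24.88},
    {"id": 2, "timestamp": "2014-01-07 08:01:10", "lat": 36.05, "long": 24.86},
]


def daily_tracks(gps_rows, assignments):
    """Group GPS points into one track per (vehicle id, date), with owner details."""
    tracks = defaultdict(list)
    for row in gps_rows:
        day = row["timestamp"][:10]  # "YYYY-MM-DD" prefix of the timestamp
        tracks[(row["id"], day)].append((row["lat"], row["long"]))
    return [
        {"id": car_id, "date": day, "owner": assignments.get(car_id), "points": points}
        for (car_id, day), points in sorted(tracks.items())
    ]


for track in daily_tracks(gps, car_assignments):
    print(track["id"], track["date"], track["owner"]["title"], len(track["points"]))
```

Vehicles without an entry in the assignment table keep an `owner` of `None`, so their tracks are not silently dropped.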

We are interested in knowing which employee visited the shops and made those transactions. However, the information available in both datasets is very limited, as only the timestamp can be used to match the vehicle dataset with the patronage dataset. Moreover, the credit card dataset records time only to the minute, while the GPS data has a higher resolution, down to the second.
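Because the two datasets record time at different resolutions, one option is to drop the seconds from the GPS timestamps before comparing them. The sketch below illustrates this; the timestamps are hypothetical, and the 30-minute tolerance in `within_window` is an arbitrary assumption, not a value derived from the data.

```python
from datetime import datetime, timedelta


def floor_to_minute(timestamp):
    """Drop seconds so a GPS timestamp matches the credit card resolution."""
    return timestamp.replace(second=0, microsecond=0)


def within_window(gps_time, card_time, minutes=30):
    """True if the card transaction lies within `minutes` of the GPS reading.

    The default tolerance is an arbitrary choice for illustration.
    """
    gap = abs(card_time - floor_to_minute(gps_time))
    return gap <= timedelta(minutes=minutes)


# Hypothetical timestamps for illustration.
gps_time = datetime(2014, 1, 6, 7, 28, 43)
card_time = datetime(2014, 1, 6, 7, 28)

print(floor_to_minute(gps_time) == card_time)
print(within_window(datetime(2014, 1, 6, 7, 10, 5), card_time))
```

An exact match at minute resolution is strict, since a purchase may happen some time after a vehicle stops, so a tolerance window may be the more practical comparison.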

We start off by plotting the route taken by each employee for each day.

Grouping the vehicle data by employment title lets us determine the normal route that each employee would take. From the chart below, we can see the following “unique” paths, which were out of the norm. These paths can be identified by the weight of the lines, which represents how frequently each path was taken.

| S/N | ID | Current Employment Title | Remarks |
|---|---|---|---|
| 1 | 22 | Badging Office | 1 trip through Ermou Street |
| 2 | 20 | Building Control | 1 trip through Coffee Chameleon |
| 3 | 26 | Drill Site Manager | 1 trip through Androutsu Street and 1 trip around Carnero Street |
| 4 | 28 | Drill Technician | For id 28, data is not usable. For id 7, there was a trip near Spetson Street |
| 5 | 3 | Engineer | 1 trip near Pilau Street |
| 6 | 14 | Engineering Group Manager | 1 trip around Carnero Street |
| 7 | 35 | Environmental Safety Adviser | 1 trip near Parla Street and 1 trip near Arkadiou Street |
| 8 | 29 | Facilities Group Manager | 1 trip past Hallowed Grounds and 1 trip past Carlyle Chemical Inc. |
| 9 | 25 | Geologist | 1 trip past Alberts Fine Clothing |
| 10 | 19 | Hydraulic Technician | 1 trip past Coffee Shack |